Search CORE

238 research outputs found

Computing patient data in the cloud: practical and legal considerations for genetics and genomics research in Europe and internationally

Author: Korbel Jan O.
Lück Rupert
Molnár-Gábor Fruzsina
Yakneen Sergei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Biomedical research is becoming increasingly large-scale and international. Cloud computing enables the comprehensive integration of genomic and clinical data, and the global sharing and collaborative processing of these data within a flexibly scalable infrastructure. Clouds offer novel research opportunities in genomics: they make it possible to carry out cohort studies at unprecedented scale, and they enable computer processing with superior pace and throughput, allowing researchers to address questions that could not be addressed by studies using limited cohorts. A well-developed example of such research is the Pan-Cancer Analysis of Whole Genomes project, which involves the analysis of petabyte-scale genomic datasets from research centers in different locations, countries and jurisdictions. Aside from the tremendous opportunities, there are also concerns regarding the use of clouds; these concerns pertain to perceived limitations in data security and protection, and to the need for due consideration of the rights of patient donors and research participants. Furthermore, the increased outsourcing of information technology impedes the ability of researchers to act within the realm of existing local regulations, owing to fundamental differences in the understanding of the right to data protection in various legal systems. In this Opinion article, we address the current opportunities and limitations of cloud computing and highlight the responsible use of federated and hybrid clouds that are set up between public and private partners as an adequate solution for genetics and genomics research in Europe, and under certain conditions between Europe and international partners. This approach could represent a sensible middle ground between fragmented individual solutions and a "one-size-fits-all" approach.

Heidelberger Dokumentenserver

Directory of Open Access Journals

Prediction of effective genome size in metagenomic samples

Author: Bork Peer
Korbel Jan O
Lercher Martin J
Raes Jeroen
von Mering Christian
Publication venue: BioMed Central
Publication date: 01/01/2007

We introduce a novel computational approach to predict effective genome size (EGS; a measure that includes multiple plasmid copies, inserted sequences, and associated phages and viruses) from short sequencing reads of environmental genomics (or metagenomics) projects. We observe considerable EGS differences between environments and link these with ecologic complexity as well as species composition (for instance, the presence of eukaryotes). For example, we estimate EGS in a complex, organism-dense farm soil sample at about 6.3 megabases (Mb), whereas that of the bacteria therein is only 4.7 Mb; for bacteria in a nutrient-poor, organism-sparse ocean surface water sample, EGS is as low as 1.6 Mb. The method also permits evaluation of completion status and assembly bias in single-genome sequencing projects.

Springer - Publisher Connector

PubMed Central

ZORA

MDC Repository

VISOR: a versatile haplotype-aware structural variant simulator for short- and long-read sequencing

Author: Alberto Magi
Ashley D. Sanders
Davide Bolognini
Jan O. Korbel
Tobias Rausch
Vladimir Benes
Publication venue
Publication date: 07/10/2019

Summary: VISOR is a tool for haplotype-specific simulations of simple and complex structural variants (SVs). The method is applicable to haploid, diploid or higher-ploidy simulations for bulk or single-cell sequencing data. SVs are implanted into FASTA haplotypes at single-basepair resolution, optionally with nearby single-nucleotide variants. Short or long reads are drawn at random from these haplotypes using standard error profiles. Double- or single-stranded data can be simulated, and VISOR supports the generation of haplotype-tagged BAM files. The tool further includes methods to interactively visualize simulated variants in single-stranded data. The versatility of VISOR is unmatched by comparable tools, and it lays the foundation for simulating haplotype-resolved cancer heterogeneity data in bulk or at single-cell resolution.

Availability and implementation: VISOR is implemented in Python 3.6, open-source and freely available at https://github.com/davidebolo1993/VISOR. Documentation is available at https://davidebolo1993.github.io/visordoc/.

Supplementary information: Supplementary data are available at Bioinformatics online.
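The implant-then-sample idea described above can be illustrated with a toy Python sketch. It is not VISOR's code or API: the function names `implant_deletion`, `implant_inversion` and `sample_reads` are hypothetical, coordinates are assumed to be 0-based and half-open, and the flat per-base substitution rate is a deliberate simplification standing in for the standard error profiles mentioned in the abstract.

```python
import random

# Hypothetical helpers illustrating SV implantation; not part of VISOR.
# Coordinates are 0-based and half-open, like Python slices.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def implant_deletion(hap: str, start: int, end: int) -> str:
    """Return the haplotype with hap[start:end] removed (a deletion SV)."""
    return hap[:start] + hap[end:]


def implant_inversion(hap: str, start: int, end: int) -> str:
    """Return the haplotype with hap[start:end] reverse-complemented (an inversion SV)."""
    inverted = hap[start:end].translate(_COMPLEMENT)[::-1]
    return hap[:start] + inverted + hap[end:]


def sample_reads(hap: str, n_reads: int, read_len: int,
                 error_rate: float, seed: int = 0) -> list[tuple[int, str]]:
    """Draw reads at uniformly random positions and add substitution errors.

    Assumption: a flat per-base substitution rate replaces a real
    sequencing-platform error profile.
    """
    if read_len > len(hap):
        raise ValueError("read length exceeds haplotype length")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        pos = rng.randint(0, len(hap) - read_len)  # upper bound is inclusive
        bases = list(hap[pos:pos + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute a different nucleotide at this position.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((pos, "".join(bases)))
    return reads


if __name__ == "__main__":
    ref = "ACGTACGTTTGGCCAAACGTTGCA"
    # Heterozygous SVs: edit haplotype 1 only and keep haplotype 2 as the reference.
    # Note that the inversion coordinates refer to the already-deleted haplotype.
    hap1 = implant_inversion(implant_deletion(ref, 2, 5), 8, 12)
    hap2 = ref
    for name, hap in (("hap1", hap1), ("hap2", hap2)):
        for pos, read in sample_reads(hap, n_reads=3, read_len=8, error_rate=0.01):
            print(name, pos, read)
```

Implanting the SVs on one haplotype only and sampling reads from both yields data for a heterozygous variant, which is the kind of haplotype-aware setting the abstract refers to. The sketch uses built-in generic type hints, so it assumes Python 3.9 or later.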

Open Access Repository

PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data

Author: Abyzov Alexej
Carriero Nicholas
Cayting Philip
Gerstein Mark B
Korbel Jan O
Mu Xinmeng Jasmine
Snyder Michael
Zhang Zhengdong
Publication venue: BioMed Central
Publication date: 23/02/2009

Paired-End Mapper (PEMer) enables mapping of genomic structural variants at considerably enhanced sensitivity, specificity and resolution over previous approaches.

Springer - Publisher Connector

PubMed Central

Systematic Association of Genes to Phenotypes by Genome and Literature Mining

Author: Andrade Miguel A
Bork Peer
Doerks Tobias
Hooper Sean D
Jensen Lars J
Kaczanowski Szymon
Korbel Jan O
Perez-Iratxeta Carolina
Publication venue: Public Library of Science
Publication date: 05/04/2005

One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far, no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently, we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes, we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

FigShare

Butler enables rapid cloud-based analysis of thousands of human genomes

Author: Gertz Michael
Korbel Jan O
PCAWG Consortium
PCAWG Technical Working Group
Waszak Sebastian M
Yakneen Sergei
Publication venue: Nat Biotechnol
Publication date: 01/01/2020

We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

OPUS Augsburg

Lund University Publications

eScholarship - University of California

Apollo (Cambridge)

Dense and accurate whole-chromosome haplotyping of individual genomes

Author: Garg Shilpa
Guryev Victor
Korbel Jan O.
Lansdorp Peter M.
Marschall Tobias
Porubsky David
Sanders Ashley D.
Publication venue
Publication date: 01/01/2017

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense but local haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles of NA12878 to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

MDC Repository

MPG.PuRe

Dissertations of the University of Groningen

High-Resolution Copy-Number Variation Map Reflects Human Olfactory Receptor Diversity and Evolution

Author: A Chess
A Keller
AE Urban
AJ Iafrate
Alexander Eckehart Urban
AN Gilbert
B Conrad
CJ Wysocki
CL Beites
Claudia Gonzaga-Jauregui
D Nguyen
DA Wheeler
Doron Lancet
E Tuzun
EV Linardopoulou
G Glusman
GH Perry
H Rahil
Harry Orr
I Menashe
I Ovcharenko
J Amoore
J Purroy
JA Bailey
Jan O. Korbel
JC Venter
JM Kidd
JM Young
JO Korbel
JS Beckmann
L Buck
L Feuk
M Nozawa
Mark B. Gerstein
MB Kambere
Michael Snyder
Miriam Khen
Philip M. Kim
R Gross-Isseroff
R Redon
S Levy
S Serizawa
T Hummel
T Olender
Tsviya Olender
WJ Kent
Yehudit Hasin
Publication venue: Public Library of Science
Publication date: 01/01/2008

Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (∼55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ∼50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human-derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central